Williams County
Selective Masking based Self-Supervised Learning for Image Semantic Segmentation
This paper proposes a novel self-supervised learning method for semantic segmentation using selective masking image reconstruction as the pretraining task. Our proposed method replaces the random masking augmentation used in most masked image modelling pretraining methods. The proposed selective masking method selectively masks image patches with the highest reconstruction loss by breaking the image reconstruction pretraining into iterative steps to leverage the trained model's knowledge. We show on two general datasets (Pascal VOC and Cityscapes) and two weed segmentation datasets (Nassar 2020 and Sugarbeets 2016) that our proposed selective masking method outperforms the traditional random masking method and supervised ImageNet pretraining on downstream segmentation accuracy by 2.9% for general datasets and 2.5% for weed segmentation datasets. Furthermore, we found that our selective masking method significantly improves accuracy for the lowest-performing classes. Lastly, we show that using the same pretraining and downstream dataset yields the best result for low-budget self-supervised pretraining. Our proposed Selective Masking Image Reconstruction method provides an effective and practical solution to improve end-to-end semantic segmentation workflows, especially for scenarios that require limited model capacity to meet inference speed and computational resource requirements.
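The selection step described in the abstract above (mask the patches the current model reconstructs worst) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_patches_to_mask`, the `mask_ratio` parameter, and the example loss values are assumptions, and the abstract does not say how per-patch losses are computed or how many patches are masked per iteration.

```python
def select_patches_to_mask(patch_losses, mask_ratio):
    """Return the indices of the patches with the highest reconstruction loss.

    patch_losses: one loss value per image patch, taken from the model as
                  trained so far (hypothetical input; how the loss is
                  measured is not specified in the abstract).
    mask_ratio:   fraction of patches to mask, in [0, 1] (assumed parameter).
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = int(len(patch_losses) * mask_ratio)
    # Rank patch indices from hardest (largest loss) to easiest.
    ranked = sorted(range(len(patch_losses)),
                    key=lambda i: patch_losses[i],
                    reverse=True)
    return sorted(ranked[:num_masked])


# Illustrative per-patch losses for an image split into 8 patches.
losses = [0.10, 0.80, 0.30, 0.95, 0.05, 0.60, 0.20, 0.40]
print(select_patches_to_mask(losses, mask_ratio=0.5))  # -> [1, 3, 5, 7]
```

In an iterative scheme of this kind, the losses would be recomputed with the updated model before each new masking round, so the mask tracks whatever the model currently finds hardest to reconstruct.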
- North America > Canada > Saskatchewan > Saskatoon (0.04)
- North America > United States > North Dakota > Williams County (0.04)
- Transportation > Ground (0.46)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > North Dakota > Williams County (0.04)
- (6 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Instructional Material (0.67)
- Information Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.93)
- Energy (0.92)
- (5 more...)
SPOT: Single-Shot Positioning via Trainable Near-Field Rainbow Beamforming
Cai, Yeyue, Mo, Jianhua, Tao, Meixia
Phase-time arrays, which integrate phase shifters (PSs) and true-time delays (TTDs), have emerged as a cost-effective architecture for generating frequency-dependent rainbow beams in wideband sensing and localization. This paper proposes an end-to-end deep learning-based scheme that simultaneously designs the rainbow beams and estimates user positions. Treating the PS and TTD coefficients as trainable variables allows the network to synthesize task-oriented beams that maximize localization accuracy. A lightweight fully connected module then recovers the user's angle-range coordinates from its feedback of the maximum quantized received power and its corresponding subcarrier index after a single downlink transmission. Compared with existing analytical and learning-based schemes, the proposed method reduces overhead by an order of magnitude and delivers consistently lower two-dimensional positioning error.
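The user feedback mentioned in the abstract above (the strongest subcarrier and its quantized received power) might look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the function name `build_feedback`, the uniform quantizer, the dBm range, and the bit width are not taken from the paper.

```python
def build_feedback(subcarrier_powers_dbm, num_bits, min_dbm, max_dbm):
    """Return (best_subcarrier_index, quantized_power_level).

    A uniform quantizer over [min_dbm, max_dbm] with 2**num_bits levels is
    assumed; the paper's actual quantization scheme is not given in the
    abstract.
    """
    if num_bits < 1 or max_dbm <= min_dbm or not subcarrier_powers_dbm:
        raise ValueError("invalid quantizer settings or empty power list")
    # Index of the subcarrier with the largest received power.
    best_index = max(range(len(subcarrier_powers_dbm)),
                     key=lambda k: subcarrier_powers_dbm[k])
    # Clip the peak power into the quantizer range, then map it to a level.
    clipped = min(max(subcarrier_powers_dbm[best_index], min_dbm), max_dbm)
    step = (max_dbm - min_dbm) / (2 ** num_bits - 1)
    level = round((clipped - min_dbm) / step)
    return best_index, level


# Illustrative received powers on four subcarriers.
powers = [-90.0, -72.0, -64.0, -80.0]
print(build_feedback(powers, num_bits=3, min_dbm=-100.0, max_dbm=-30.0))
```

Because a rainbow beam points different subcarriers in different directions, the index of the strongest subcarrier carries position information, which is why such a small feedback message can be enough for the position-estimation module.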
- North America > United States > Utah > Uintah County (0.41)
- North America > United States > North Dakota > Williams County (0.41)
- North America > United States > Montana > Richland County (0.41)
- (2 more...)
Predicting Oscar-Nominated Screenplays with Sentence Embeddings
Oscar nominations are an important factor in the movie industry because they can boost both a film's visibility and its commercial success. This work explores whether it is possible to predict Oscar nominations for screenplays using modern language models. Since no suitable dataset was available, a new one called Movie-O-Label was created by combining the MovieSum collection of movie scripts with curated Oscar records. Each screenplay was represented by its title, Wikipedia summary, and full script. Long scripts were split into overlapping text chunks and encoded with the E5 sentence embedding model. Then, the screenplay embeddings were classified using a logistic regression model. The best results were achieved when three feature inputs related to screenplays (script, summary, and title) were combined. The best-performing model reached a macro F1 score of 0.66, a precision-recall AP of 0.445 (against a baseline of 0.19), and a ROC-AUC of 0.79. The results suggest that even simple models based on modern text embeddings demonstrate good prediction performance and might be a starting point for future research.
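The chunking step described in the abstract above can be sketched as follows. This is an illustration under stated assumptions: the chunk size, the overlap, and the use of mean pooling to combine chunk embeddings into one screenplay vector are not given in the abstract, and the embedding vectors below are placeholders rather than real E5 outputs.

```python
def split_into_overlapping_chunks(tokens, chunk_size, overlap):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the script.
        if start + chunk_size >= len(tokens):
            break
    return chunks


def mean_pool(vectors):
    """Average chunk embeddings into one vector (assumed aggregation)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


script_tokens = "INT. KITCHEN - NIGHT . She opens the letter and reads".split()
for chunk in split_into_overlapping_chunks(script_tokens, chunk_size=4, overlap=2):
    print(chunk)

# Placeholder 2-dimensional "embeddings" for two chunks.
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # -> [2.0, 3.0]
```

The overlap keeps sentences that straddle a chunk boundary visible in at least one window, at the cost of embedding some text twice.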
- North America > United States > California (0.14)
- Europe > Germany > Bavaria > Regensburg (0.05)
- North America > United States > North Dakota > Williams County (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > North Dakota > Williams County (0.04)
- (6 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Instructional Material (0.67)
- Media > Film (1.00)
- Information Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.93)
- (5 more...)
On the transferability of Sparse Autoencoders for interpreting compressed models
Gupte, Suchit, Chhabra, Vishnu Kabir, Khalili, Mohammad Mahdi
Modern LLMs face inference efficiency challenges due to their scale. To address this, many compression methods have been proposed, such as pruning and quantization. However, the effect of compression on a model's interpretability remains unclear. While several model interpretation approaches exist, such as circuit discovery, Sparse Autoencoders (SAEs) have proven particularly effective in decomposing a model's activation space into its feature basis. In this work, we explore the differences in SAEs for the original and compressed models. We find that SAEs trained on the original model can interpret the compressed model, albeit with slight performance degradation compared to an SAE trained on the compressed model. Furthermore, simply pruning the original SAE itself achieves performance comparable to training a new SAE on the pruned model. This finding enables us to mitigate the extensive training costs of SAEs.
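To make the two ideas in the abstract above concrete, the sketch below shows a common form of sparse autoencoder (ReLU encoder, linear decoder) together with magnitude pruning of a weight matrix. Both choices are assumptions for illustration: the abstract does not specify the SAE architecture or which pruning method is applied to the SAE, and all names and weights here are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode an activation into non-negative features, then reconstruct it."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    features = [max(0.0, p) for p in pre]  # ReLU keeps only active features
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


def magnitude_prune(matrix, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude.

    Ties at the threshold are all zeroed, so slightly more weights than
    requested may be removed.
    """
    flat = sorted(abs(w) for row in matrix for w in row)
    num_pruned = int(len(flat) * sparsity)
    if num_pruned == 0:
        return [row[:] for row in matrix]
    threshold = flat[num_pruned - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in matrix]


# Toy SAE with 2 input dimensions and 2 features.
w_enc = [[1.0, 0.1], [-0.2, 1.5]]
w_dec = [[1.0, 0.0], [0.0, 0.7]]
bias = [0.0, 0.0]
activation = [0.5, -1.0]

print(sae_forward(activation, w_enc, bias, w_dec, bias))
pruned_enc = magnitude_prune(w_enc, sparsity=0.5)
print(sae_forward(activation, pruned_enc, bias, w_dec, bias))
```

Comparing the reconstruction from the pruned weights against the unpruned one on activations from the compressed model is the kind of check that would indicate whether retraining the SAE is needed.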
- North America > United States > North Dakota > Williams County (0.04)
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > India (0.04)
Sugar-Beet Stress Detection using Satellite Image Time Series
Sadbhave, Bhumika Laxman, Vaeth, Philipp, Dejon, Denise, Schorcht, Gunther, Gregorová, Magda
Satellite Image Time Series (SITS) data has proven effective for agricultural tasks due to its rich spectral and temporal nature. In this study, we tackle the task of stress detection in sugar-beet fields using a fully unsupervised approach. We propose a 3D convolutional autoencoder model to extract meaningful features from Sentinel-2 image sequences, combined with acquisition-date-specific temporal encodings to better capture the growth dynamics of sugar-beets. The learned representations are used in a downstream clustering task to separate stressed from healthy fields. The resulting stress detection system can be directly applied to data from different years, offering a practical and accessible tool for stress detection in sugar-beets.
- North America > United States > North Dakota > Williams County (0.30)
- Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.05)
- Europe > Poland (0.04)
- Europe > France (0.04)
Solar-powered robot zaps weeds without chemicals
Out in the California sun, a new kind of farmhand is hard at work. Powered by solar energy and guided by artificial intelligence, the solar-powered weeding robot for cotton fields is offering farmers a smarter and more sustainable way to tackle weeds. This technology is arriving just in time, as growers across the country face a shortage of available workers and weeds that are becoming increasingly resistant to herbicides. Farmers everywhere are facing a tough reality.
- North America > United States > California (0.27)
- Europe > United Kingdom > North Sea > Southern North Sea (0.27)
- North America > United States > North Dakota > Williams County (0.05)
MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
Shan, Xiaojun, Cao, Qi, Han, Xing, Yu, Haofei, Liang, Paul Pu
Recent advances in multimodal foundation models have achieved state-of-the-art performance across a range of tasks. These breakthroughs are largely driven by new pre-training paradigms that leverage large-scale, unlabeled multimodal data, followed by instruction fine-tuning on curated labeled datasets and high-quality prompts. While there is growing interest in scaling instruction fine-tuning to ever-larger datasets in both quantity and scale, our findings reveal that simply increasing the number of instruction-tuning tasks does not consistently yield better performance. Instead, we observe that grouping tasks by the common interactions across modalities, such as discovering redundant shared information, prioritizing modality selection with unique information, or requiring synergistic fusion to discover new information from both modalities, encourages the models to learn transferrable skills within a group while suppressing interference from mismatched tasks. To this end, we introduce MINT, a simple yet surprisingly effective task-grouping strategy based on the type of multimodal interaction. We demonstrate that the proposed method greatly outperforms existing task grouping baselines for multimodal instruction tuning, striking an effective balance between generalization and specialization.
- Europe > Switzerland > Zürich > Zürich (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > North Dakota > Williams County (0.04)
- (9 more...)
- Information Technology (1.00)
- Health & Medicine (1.00)
- Energy (0.68)
- (3 more...)
SemanticSugarBeets: A Multi-Task Framework and Dataset for Inspecting Harvest and Storage Characteristics of Sugar Beets
Croonen, Gerardus, Trondl, Andreas, Simon, Julia, Steininger, Daniel
While sugar beets are stored prior to processing, they lose sugar due to factors such as microorganisms present in adherent soil and excess vegetation. Their automated visual inspection promises to aid in quality assurance and thereby increase efficiency throughout the processing chain of sugar production. In this work, we present a novel high-quality annotated dataset and two-stage method for the detection, semantic segmentation and mass estimation of post-harvest and post-storage sugar beets in monocular RGB images. We conduct extensive ablation experiments for the detection of sugar beets and their fine-grained semantic segmentation regarding damage, rot, soil adhesion and excess vegetation. For these tasks, we evaluate multiple image sizes, model architectures and encoders, as well as the influence of environmental conditions. Our experiments show an mAP50-95 of 98.8 for sugar-beet detection and an mIoU of 64.0 for the best-performing segmentation model.
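For readers unfamiliar with the segmentation metric quoted in the abstract above, mean intersection over union (mIoU) averages the per-class overlap between predicted and ground-truth labels. The sketch below uses flat lists of per-pixel class labels; the function name and the choice to skip classes absent from both prediction and ground truth are assumptions, and the paper's exact evaluation protocol is not stated in the abstract.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes, for flat label lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        # Skip classes that appear in neither prediction nor ground truth.
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
predicted = [0, 0, 1, 1]
ground_truth = [0, 1, 1, 1]
print(round(100 * mean_iou(predicted, ground_truth, num_classes=2), 1))  # -> 58.3
```

Scores such as the 64.0 reported above are this quantity expressed as a percentage.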
- Food & Agriculture > Agriculture (0.69)
- Health & Medicine (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
- Information Technology > Artificial Intelligence > Vision (0.70)